An Analysis of XML Binary Formats and Compression

Authors

  • Christopher J. Augeri
  • Barry E. Mullins
  • Dursun A. Bulutoglu
  • Rusty O. Baldwin
  • Leemon C. Baird
Abstract

XML simplifies data exchange among disparate computers, but it is notoriously verbose, which has spawned the development of a variety of XML compressors and binary formats. Some formats allow streaming access to the data without complete decompression. We present an XML test file corpus, akin to corpora such as the Canterbury corpus, and a combined efficiency metric integrating compression ratio and speed. We then use the test corpus to assess 14 general-purpose and XML-specific compressors against the efficiency metric and other metrics. After constructing linear regression models, we identify the factors influencing compressor selection and then rank the best-performing compressors.
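
The kind of measurement the abstract describes can be sketched roughly as follows. This is a minimal illustration only, not the authors' corpus, compressor set, or metric: the synthetic `SAMPLE` document, the helper names `measure` and `transfer_cost`, and the assumed bandwidth are all hypothetical. The combined score used here (compression time plus the time to send the compressed bytes over an assumed link) is just one plausible way to fold ratio and speed into a single number.

```python
import bz2
import lzma
import time
import zlib

# Synthetic, deliberately verbose XML; a stand-in for a real corpus file.
SAMPLE = (
    b"<records>"
    + b"".join(
        b'<record id="%d"><name>item%d</name></record>' % (i, i)
        for i in range(2000)
    )
    + b"</records>"
)


def measure(name, compress, data):
    """Time one compressor and report compressed size and ratio.

    The ratio is compressed size / original size, so smaller is better.
    """
    start = time.perf_counter()
    packed = compress(data)
    seconds = time.perf_counter() - start
    return {
        "name": name,
        "size": len(packed),
        "ratio": len(packed) / len(data),
        "seconds": seconds,
    }


def transfer_cost(seconds, size, bandwidth=125_000.0):
    """Hypothetical combined score (an assumption, not the paper's metric):
    compression time plus time to send `size` bytes at `bandwidth` bytes/s.
    """
    return seconds + size / bandwidth


# General-purpose compressors from the Python standard library.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}

results = [measure(name, fn, SAMPLE) for name, fn in COMPRESSORS.items()]
ranked = sorted(results, key=lambda r: transfer_cost(r["seconds"], r["size"]))

for r in ranked:
    print(f"{r['name']:5s} ratio={r['ratio']:.3f} time={r['seconds'] * 1000:.2f} ms")
```

Timings vary between runs and machines, so any ranking from a sketch like this is only indicative; a serious comparison would repeat each measurement and use real XML files of varied size and structure.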

Similar resources

Energy Efficient Data Formatting Scheme: A Review and Analysis on XML Alternatives

A format for data representation and its associated facilities can be referred to as a Data Formatting Scheme (DFS). The de facto standard format is the Extensible Markup Language (XML). XML is advantageous because of features such as structured code, security, and self-descriptiveness. But XML has an inherent limitation: the verbosity caused by its abundant use of tags. Several alternative proposals are ...

MS Word template for letter size paper

Efficient packaging and communication of metadata are critical in multimedia communications to achieve seamless mobility. The XML schema compression proposed here comprises a method for decomposing an XML schema into a sequence of atomic elements. This representation reorganizes the given XML schema with the threefold purpose of facilitating dynamic schema switching and reconfiguration of metad...

XML and Scientific File Formats

For many years, scientific data has been stored and transferred using a variety of data formats. In recent years XML has become an important and popular format for exchanging digital information. At the most abstract level, XML can be used for any purpose that a binary format might be used, and vice versa. This paper discusses the strengths and weaknesses of XML and binary formats, and speculates on t...

Context- and Template-Based Compression for Efficient Management of Data Models in Resource-Constrained Systems

The Cyber Physical Systems (CPS) paradigm is based on the deployment of interconnected heterogeneous devices and systems, so interoperability is at the heart of any CPS architecture design. In this sense, the adoption of standard and generic data formats for data representation and communication, e.g., XML or JSON, effectively addresses the interoperability problem among heterogeneous systems. ...

Open Metadata Formats: Efficient XML-Based Communication for Heterogeneous Distributed Systems

Definition and translation of metadata is incorporated in all systems that exchange structured data. We observe that the manipulation of this metadata can be decomposed into three separate steps: discovery of the metadata, binding of program objects to the message formats represented in the metadata, and marshaling of data to and from wire formats using the metadata. We have designed a method o...

Publication date: 2007